100 random samples

The population is 24 mice, 12 of each sex and 12 of each POE, with the same genetic composition as a result of crossing. Genes where at least three samples do not have 10 counts were removed. Genes without at least one count for both alleles were removed. Genes with a significant sex or parent effect was removed.

We constructed 100 random samples (sample runs) of size 6 from the population such that the same sample never appeared twice. For each random sample, we calulated the MLE, apeglm and ash estimates, and the MLE of the leftover 18 mice was taken to be the truth. We define shrinkage as movement from MLE to zero, and improvement as movement from MLE to truth. So if the apeglm estimate of an allele has shrinkage=0.1 and improvement=0.05, that means the apeglm estimate is 0.1 closer to 0 than the MLE, and 0.05 closer to the truth than the MLE. We define a gene as shrunk if shrinkage>0.1.

We see that mean absolute error and median concordance of top 500 genes was better for apeglm than ash and MLE. More importantly, while apeglm tended to give better estimates for shrunk genes, ash tended to give worse estimates. Ash also shrinks more heavily than apeglm across sample runs (results not shown), suggesting that ash is overshrinking. We looked at concordance of top 500 genes and not a lower number, such as top 50 genes, because many alelic proportions were very small, and our C++ code caps per-sample proportion estimates at 0.001. For instance, the tenth largest gene was 0.001, meaning our C++ code is not getting the MLE for the top 10 genes. As investigators do not wish to distinguish between allelic proportions of 0.01 and allelic proportions of 0.0001, concordance of the top 10 or even top 100 genes isn’t as important. It is worth noting, however, that apeglm beats the MLE and ash when looking at concordance of top 10, 50 and 100 genes as well.

## summary statistics for Mean Absolute Error for MLE, apeglm, ash (in that order) across sample runs
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.09842 0.10659 0.11005 0.11227 0.11760 0.13569
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.09365 0.09933 0.10210 0.10227 0.10546 0.11307
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1156  0.1278  0.1307  0.1315  0.1357  0.1507
## summary statistics for Mean Absolute Error for MLE and apeglm, for apeglm shrunk genes (shrinkage>0.1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4922  0.5365  0.5558  0.5545  0.5705  0.6310
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4319  0.4609  0.4768  0.4778  0.4889  0.5573
## summary statistics for Mean Absolute Error for MLE and ash, for ash shrunk genes (shrinkage>0.1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3902  0.4233  0.4360  0.4375  0.4493  0.4992
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5381  0.5714  0.5860  0.5867  0.5982  0.6280
## median improvement for shrunk genes, apeglm and ash (in that order)
## [1] 0.06055599
## [1] -0.05004853
## top 500 concordance for MLE, MAP, ash (in that order) across sample runs
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8980  0.9200  0.9240  0.9249  0.9300  0.9520
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8920  0.9080  0.9180  0.9157  0.9220  0.9420
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6700  0.7600  0.7800  0.7787  0.8005  0.8580

case study

Below are estimate vs. true, MA and CAT plots for one of the random runs above, as well as quartiles of shrinkage for apeglm and ash. We can see that ash shrinks more heavily compared to apeglm and is overshrinking, similar to my prior analysis when looking at the real dataset. There was no optimal filtering rule that leads to the MLE beating or matching apeglm in a CAT plot. We can also see that, as the filtering threshold becomes more strict, concordance decreases. Further investigation revealed that the largest true effects have very small counts. This is because the “true effects” is just the MLE of a leftover set of 17. It is likely that the REAL true effects are much smaller than the MLE of the leftover set, which has very large (or very small) effects due to high variance of the MLE induced by small counts. This also means that concordance evaluations for the real data set that have been discussed here should be taken with a grain of salt.

We define a gene’s total counts as the sum of the gene’s counts across samples.

## quantiles of shrinkage for apeglm (top) and ash (bottom)
##         50%         75%         90%         95%       97.5%         99% 
## 0.006779858 0.021649752 0.094550367 0.196088437 0.342135436 0.629298608 
##       99.5% 
## 3.078815221
##        50%        75%        90%        95%      97.5%        99% 
## 0.01963911 0.04654791 0.14356885 0.28641230 0.51513635 1.56157790 
##      99.5% 
## 6.25538157

## summary statistics of the gene-wide total counts of our leftover data
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      78    1960    6956   19635   18914 1529935
## smallest 50 gene effects of leftover population with associated gene-wide total counts
##    leftoverEffect totalCounts
## 1       -7.192132         569
## 2       -7.113289         397
## 3       -7.055602         244
## 4       -7.046056         223
## 5       -6.360247        2154
## 6       -6.239011         904
## 7       -6.216214        1819
## 8       -6.178062         802
## 9       -6.152260         729
## 10      -6.085572        4249
## 11      -5.824845        1177
## 12      -5.822200         536
## 13      -5.659269        3119
## 14      -5.578971        3285
## 15      -5.514936         366
## 16      -5.482757         354
## 17      -5.448093         755
## 18      -5.423339         349
## 19      -5.368445         295
## 20      -5.317960        1034
## 21      -5.297017        2411
## 22      -5.273536         284
## 23      -5.266839         294
## 24      -5.240560         270
## 25      -5.210501         882
## 26      -5.192906         515
## 27      -5.183641         866
## 28      -5.157452         552
## 29      -5.131816        1431
## 30      -5.116540         242
## 31      -5.113956         241
## 32      -5.108449         752
## 33      -5.006954        4992
## 34      -4.964991        2191
## 35      -4.915666         195
## 36      -4.915542         176
## 37      -4.901752         186
## 38      -4.893111         272
## 39      -4.881159         406
## 40      -4.863504         199
## 41      -4.837534         163
## 42      -4.802772         256
## 43      -4.794868         404
## 44      -4.746555         672
## 45      -4.743060         169
## 46      -4.731371         170
## 47      -4.726384         165
## 48      -4.720318         540
## 49      -4.714926         157
## 50      -4.669534         134